CATER: Combined Animal Tracking & Environment Reconstruction

Quantifying the behavior of small animals traversing long distances in complex environments is one of the most difficult tracking scenarios for computer vision. Tiny, low-contrast foreground objects have to be localized in cluttered and dynamic scenes, and trajectories have to be compensated for camera motion and drift across multiple lengthy recordings. We introduce CATER, a novel methodology combining an unsupervised probabilistic detection mechanism with a globally optimized environment reconstruction pipeline, enabling precision behavioral quantification in natural environments. CATER is implemented as an easy-to-use and highly parallelized tool, and we show its application to recover fine-scale motion trajectories, registered to a high-resolution image mosaic reconstruction, of naturally foraging desert ants from unconstrained field recordings. By bridging the gap between laboratory and field experiments, we gain previously unknown insights into ant navigation with respect to motivational states, previous experience, and current environments, and provide an appearance-agnostic method applicable to studying the behavior of a wide range of terrestrial species under realistic conditions.

path (ZVSF trial), followed by a completely unfamiliar location approximately 180 degrees and 3-4m from their normal homing path (ZVOP trial). Note that between these trials ants had to complete an additional out and return path to the feeder to create the zero-vector condition.
Recording Hardware. For video recording, an off-the-shelf camcorder (Panasonic HDC TM-900) was used to capture 1080p video at 50 frames per second without compression. A custom-made rig held the camera on a 1.5m horizontal arm, allowing video to be recorded from directly above the ant at a constant height without the experimenter disturbing the foragers. To simplify the capturing process, we attached standard red laser pointers around the camera, pointing to the ground and providing a dot pattern. Thus, the experimenter simply had to keep the ant within the laser dot pattern, which corresponds to the center of the image frame.

The Ant Ontogeny Dataset
We recorded the paths taken by 14 ants from their first foraging trip until they completed each of the displacement trials. Figure S4 details the video data captured for every ant as they progressed from early unrewarded foraging paths (light green background), to foraging paths that extended to our 8m boundary where they were rewarded (bolder green background), before being subjected to a series of displacement trials (blue background). We note that in some cases ants encountered natural prey within the 8m search space (peach background) and may have searched at more than one 8m feeding site ahead of testing (see notes column).
Notable points include: 14 ants were tracked until they had completed the ZVF trial, 13/14 also completed the subsequent ZVSF control, and 11/14 also completed the final ZVOP control; 7/14 ants returned to their first 8m feeding site on the next outward path, a further 5 returned to a 2nd site on successive outward paths, and 2 moved on to a 3rd location; 3/14 ants found natural prey during their normal foraging; excluding these rewarded trials, ants completed on average 2.2 (median=2, standard deviation=1.9) foraging paths in the local nest area before returning to the nest without reward.
For video analysis, all videos were converted into image stacks using ffmpeg v4.2.4 (https://ffmpeg.org/) on Ubuntu 20.04 using the command:
The tracker was run on all 151 videos. Manual positional corrections were added to 52 videos, including all 31 local search paths that did not lead the ants to natural prey or the 8m boundary, plus 8/14 ZVF trials. This ensured that the tracking position was within 1 body-length of the animal in every frame. We note that the majority of the remaining uncorrected videos are displacement trials in which the ant generally moves faster, in the open, and is carrying a cookie, which improves automated tracking substantially. Indeed, analysis of the paths of the tracker for corrected vs uncorrected ZVF trials shows no significant difference (Figure S5).
In addition, contextual labels were added to many videos. Firstly, a label was added to all frames across all 151 videos indicating whether ants were carrying a food item (cookie or natural prey) or not, which we take as a proxy for motivation (foraging vs homing). For 70/151 videos (26/26 exploratory walks, 13/14 ZVF trials, and the foraging routes of 7 ants), labels were added to indicate all frames in which ants entered a bush, entered a shadow, or were invisible to the camera (e.g., obscured by a bush). Finally, for all 26 videos that could be considered exploratory walks, occurrences of scans, voltes, or pirouettes (33, 45) were added.

Behavioral Analysis & Results
CATER outputs both image mosaic reconstructions of the environment and the embedded trajectories of the animals. Several example routes and reconstructions can be found in Figure S6, and Section Supplementary Video provides a video demonstrating the combined tracking and reconstruction results.
Homing Performance of Ants Without Exploratory Paths. Figure S7A plots the final approach (the path within 2m of the nest) on the return journey of the 3 ants that reached the 8m boundary on their first foraging trip (colored paths). Also shown are the final approaches on the first return from the 8m boundary of the 9 ants that completed an average of 2.2 exploratory walks before reaching the 8m boundary (black paths) and thus should have more experience of the nest area. Note that paths have been rotationally aligned to ease visual comparison of directness. Figure S7B plots the straightness and duration of the final approach for these groups. The ants that did not complete any exploratory runs before reaching the 8m boundary show no clear degradation in homing performance.
Figure S8 plots the search time (top row), maximum distance (middle row), and accumulated angular coverage (bottom row) for each of the exploratory paths (those that did not reach 8m) for all ants tracked. Each ant is assigned a unique color, and data points of the same individual are linked to allow easy comparison and observation of trends. Single data points are added to the lower plot for ants that either reached the 8m boundary or found natural prey on the first forage, to assess whether they had explored around the nest during that first long forage. The time and distance of the observed exploration paths show no clear increase over successive trips, and while there is a positive trend in the accumulated angular coverage, ants typically reached the 8m boundary without exploring an area beyond 200°.
Figure S9 presents data showing the difference between outward and inward route learning in ants. In all panels, each ant is assigned a unique color, and data points of the same individual are linked to allow easy comparison and observation of trends. Figure S9 (top panel) shows the accuracy with which individuals returned to the previous feeding site.
Figure S9 (2nd row) shows the similarity of successive outward (left column, dashed lines) and successive inward routes (right column, solid lines) of ants as measured using the SSPD score (see Section Trajectory Processing). Figure S9 (3rd row) shows the straightness of successive outward (left column, dashed lines) and successive inward routes (right column, solid lines) of ants when compared to the direct line between their feeding site and the nest. That is, a perfectly straight path would score 1. Figure S9 (bottom row) shows the average speed of ants during successive outward (left column, dashed lines) and successive inward routes (right column, solid lines).
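The path measures plotted in Figures S8 and S9 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the analysis code used for the figures: trajectories are taken to be lists of (x, y) points with one point per video frame (50 frames per second, as recorded); straightness is assumed to be the direct feeder-to-nest distance divided by the travelled path length; accumulated angular coverage is assumed to be the total width of the bearing sectors around the nest visited so far (the sector width `bin_deg` is a hypothetical parameter); and SSPD is assumed to denote the symmetric segment-path distance, i.e. the mean point-to-trajectory distance averaged over both directions. All function names are illustrative.

```python
import math

def path_length(path):
    """Sum of Euclidean step lengths along a list of (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))

def max_distance(path, nest):
    """Largest distance from the nest reached along the path."""
    return max(math.dist(p, nest) for p in path)

def accumulated_coverage(paths, nest, bin_deg=10):
    """Cumulative angular coverage (degrees) after each successive path.

    Assumption: bearings from the nest are binned into sectors of bin_deg
    degrees (bin_deg must divide 360); a sector counts once a point lies in it.
    """
    n_bins = 360 // bin_deg
    seen, coverage = set(), []
    for path in paths:
        for x, y in path:
            dx, dy = x - nest[0], y - nest[1]
            if dx == 0 and dy == 0:
                continue  # bearing is undefined at the nest itself
            bearing = math.degrees(math.atan2(dy, dx)) % 360.0
            seen.add(int(bearing // bin_deg) % n_bins)
        coverage.append(len(seen) * bin_deg)
    return coverage

def straightness(path, start, goal):
    """Direct start-to-goal distance over travelled length (1 = straight)."""
    length = path_length(path)
    return math.dist(start, goal) / length if length > 0 else float("nan")

def mean_speed(path, fps=50.0):
    """Average speed in input units per second, one point per frame."""
    duration = (len(path) - 1) / fps
    return path_length(path) / duration if duration > 0 else float("nan")

def point_to_segment(p, a, b):
    """Distance from point p to the line segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg2 = dx * dx + dy * dy
    if seg2 == 0:
        return math.dist(p, a)  # degenerate segment
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg2))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))

def spd(t1, t2):
    """Mean distance from each point of t1 to trajectory t2 (>= 2 points)."""
    segments = list(zip(t2, t2[1:]))
    return sum(min(point_to_segment(p, a, b) for a, b in segments)
               for p in t1) / len(t1)

def sspd(t1, t2):
    """Symmetric segment-path distance: lower values mean more similar routes."""
    return 0.5 * (spd(t1, t2) + spd(t2, t1))
```

Under these assumptions, two identical routes give an SSPD of 0, and a route that runs parallel to another at a constant offset gives an SSPD equal to that offset. The sector-based coverage only counts sectors containing at least one tracked point, so a coarse `bin_deg` is advisable for sparsely sampled paths.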

Comparison to State-of-the-Art
To demonstrate the novelty of our algorithm, we compared it to several state-of-the-art detection algorithms (Table S1). As illustrated, none of the existing algorithms is capable of addressing all three major challenges of in-field insect tracking. Moreover, we compared our system with existing image stitching and object detection algorithms (Fig. S11). As illustrated in Fig. S11 A, conventional image stitching algorithms fail to extract consistent image mosaics, and Fig. S11 B demonstrates that background subtraction-based foreground segmentation strategies (here kNN as used in (76)) also fail to provide acceptable animal localizations. Fig. S11 C shows detection results of CATER compared to YOLOv5 (77) and Super-DiMP (78). Given the very small size of the insects, deviations of more than 10 pixels can already lead to erroneous behavioral characteristics (see also Fig. 1 E).

Fig. S7. Homing trajectories (A) and statistics (B). Assessment of homing capabilities of ants that did (black) and did not (color) perform exploratory walks before reaching 8m.

... Ants, however, displayed more regular oscillations in unfamiliar terrain (middle panel, significantly higher magnitude of the Fourier transform peak), but the frequency of these oscillations remained similar across conditions (right panel). Each dot represents the trial of a given ant; lines connect the same individual across conditions. ANOVA F and p values are shown. Analysis was conducted only on homing ants that did not interact with bushes.

Fig. S11. ... (76). Note that the moving camera setting results in erroneous foreground estimates. Bottom: The camera-motion compensated foreground extraction as used in CATER results in more reliable sparse foreground estimates. (C) Comparison of CATER to two state-of-the-art object detection mechanisms, namely YOLOv5 (77) and Super-DiMP (78). Median error (med) and median absolute deviation (mad) are given in pixels.

Tables
Table S1. Comparison of different detection and tracking systems as commonly used for animal tracking with reference to the three key challenges defined. Trackers marked by an asterisk (*) indicate trackers tested with our ant data; see Fig. S11.
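
The median error (med) and median absolute deviation (mad) quoted for the detector comparison in Fig. S11 C can be computed from per-frame localization errors as sketched below. This is an illustrative example only: the function names are hypothetical, and it assumes that predicted and ground-truth positions are given as per-frame (x, y) pixel coordinates in matching order.

```python
import math
import statistics

def localization_errors(predicted, ground_truth):
    """Per-frame Euclidean distance (pixels) between prediction and ground truth."""
    return [math.dist(p, g) for p, g in zip(predicted, ground_truth)]

def median_and_mad(errors):
    """Median error and median absolute deviation around that median."""
    med = statistics.median(errors)
    mad = statistics.median([abs(e - med) for e in errors])
    return med, mad
```

Both statistics are robust to occasional gross failures, which makes them better suited than mean and standard deviation when a detector briefly locks onto background clutter.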